CENTRAL FAX CENTER

MAR 23 2007

REMARKS/ARGUMENTS

In view of the foregoing amendments and the 
following remarks, the applicants respectfully submit 
that the pending claims comply with 35 U.S.C. § 112, 
comply with 35 U.S.C. § 101, are not anticipated under 35 
U.S.C. § 102 and are not rendered obvious under 35 U.S.C. 
§ 103. Accordingly, it is believed that this application 
is in condition for allowance. If, however, the Examiner
believes that there are any unresolved issues, or
believes that some or all of the claims are not in
condition for allowance, the applicants respectfully
request that the Examiner contact the undersigned to
schedule a telephone Examiner Interview before any
further action on the merits.

The applicants will now address each of the issues 
raised in the outstanding Office Action. 

Objections 

Claim 51 is objected to because it depends on 
itself. Claim 51 has been amended to depend from claim 
50. Accordingly, this objection should be withdrawn. 

Rejections under 35 U.S.C. § 112

Claims 46, 47, 49 and 52 are rejected under 35 
U.S.C. § 112, second paragraph, as being indefinite for
failing to particularly point out and distinctly claim 
the subject matter which applicants regard as the 
invention. The applicants respectfully request that the
Examiner reconsider and withdraw this ground of rejection 
in view of the following. 

Claim 46 has been amended to provide proper 
antecedent basis for various terms and to clarify the 
claim. Accordingly, the applicants respectfully request 
that the Examiner reconsider and withdraw this rejection. 

The Examiner found that claim 47 recited various 
terms that lacked proper antecedent basis. Specifically, 
the Examiner found that "the candidate search result" 
recited on lines 6-7 and 11 lacks sufficient antecedent 
basis. This claim has been amended to provide proper 
antecedent basis. It has also been amended to clarify 
what was meant by "that" in line 11, and "the two 
candidate search results." Accordingly, the applicants 
respectfully request that the Examiner reconsider and 
withdraw this rejection. 

The Examiner found claim 49 to recite a negative 
limitation since it recites the word "not." First, MPEP 
§ 2173.05(i) provides, "there is nothing inherently
ambiguous or uncertain about a negative limitation. So 
long as the boundaries of the patent protection sought 
are set forth definitely, albeit negatively, the claim 
complies with the requirements of 35 U.S.C. 112, second 
paragraph." The boundaries of the claim are clear. 
Furthermore, as used in the claim, the word "not" does 
not recite a negative limitation. Specifically, the 
claim recites "determining whether or not the two 
documents are near-duplicate documents" and "determining
whether or not any one of the at least two fingerprints 
of a first of the two documents matches any one of the at 
least two fingerprints of a second of the two documents". 
In both instances, there is a determination act, which is
not a negative limitation. Frankly, the applicants
cannot see how "determining whether or not . . ." is any less
proper than "determining whether . . . ." Accordingly, the
applicants respectfully request that the Examiner 
reconsider and withdraw this rejection. 

Finally, the Examiner found claim 52 to recite a 
negative limitation since it recites the word "not." 
Claim 52 has been amended to recite "wherein at least 
some contiguous elements in a document are not contiguous 
elements of a list." This is not a negative limitation 
because it further defines features of at least some 
contiguous elements in a document. Accordingly, the 
applicants respectfully request that the Examiner 
reconsider and withdraw this rejection. 

Rejections under 35 U.S.C. § 101

Claims 46-67 are rejected under 35 U.S.C. § 101 
because the claimed invention is purportedly directed to 
non-statutory subject matter. The applicants 
respectfully request that the Examiner reconsider and 
withdraw this ground of rejection in view of the 
following. 

The applicants respectfully disagree with the 
Examiner's conclusion since the claims produce concrete, 
useful and tangible results. Specifically, as the 
specification states: 

As can be appreciated from 
the foregoing, improved near-duplicate 
detection techniques are disclosed. 
These near-duplicate detection 
techniques are robust, and reduce
processing and storage requirements.
Such reduced processing and storage
requirements is particularly important
when processing large document
collections.

The near-duplicate detection 
techniques have a number of important 
practical applications. In the context
of a search engine for example, these 
techniques can be used during a 
crawling operation to speed-up the 
crawling and to save bandwidth by not 
crawling near-duplicate Web pages or 
sites, as determined from documents 
uncovered in a previous crawl. 
Further, by reducing the number of Web 
pages or sites crawled, these 
techniques can be used to reduce 
storage requirements of a repository, 
and therefore, other downstream stored
data structures. These techniques can 
instead be used later, in response to a 
query, in which case a user is not 
annoyed with near-duplicate search 
results. These techniques may also be
used to "fix" broken links. That is, 
if a document (e.g., a Web page) 
doesn't exist (at a particular location
or URL) anymore, a link to a 
near-duplicate page can be provided. 
[Emphasis added.] 

Page 43, line 13 through page 44, line 3. Thus, the 
invention has real-world value. 



Independent method claim 46 has been amended to 
further recite that a filtered set of search results 
including only those of the plurality of candidate search 
results that have not been rejected is defined. 
Independent claim 47 has been similarly amended. As can 
be appreciated from the foregoing, these claims now 
clearly generate a concrete, useful and tangible result.
Consequently, claims 46 and 47 now recite statutory 
subject matter. 

Independent claim 49 has been amended to further 
recite using the determination of whether or not the two 
documents are near-duplicates in at least one of (A) an 
act of serving search results corresponding to documents, 
(B) an act of crawling documents, (C) an act of indexing 
documents, and (D) an act of fixing a broken link to at 
least one of the two documents. These useful, concrete
and tangible results are described in the specification 
at page 18, lines 14-31. As can be appreciated from the 
foregoing, these claims now clearly generate a concrete, 
useful and tangible result. Thus, claim 49 now recites 
statutory subject matter. 

Independent claims 48, 50, 52, 53 and 67 recite a
machine-readable medium storing various data structures.
As was the case in In re Lowry, 32 U.S.P.Q.2d 1031 (Fed.
Cir. 1994), these claims are more than a mere abstraction;
the claimed data structures are specific structural
elements in memory. Further, they provide tangible
benefits. Specifically, embodiments consistent with the
present invention may detect near-duplicate documents by
(i) for each document, generating fingerprints, (ii)
preprocessing (optionally) the fingerprints to eliminate
those that only occur in one document, and (iii)
determining near-duplicate documents based on the
(remaining) fingerprints. The act of generating
fingerprints for each document may be effected by (i)
extracting parts (e.g., words) from the documents, (ii)
hashing each of the extracted parts to determine which of
a predetermined number of lists is to be populated with a 
given part, and (iii) for each of the lists, generating a 
fingerprint. The claims recite the lists formed from 
elements of a document. Fingerprints generated from 
these lists allow duplicate documents to be found using 
less time, processing and memory than at least one other 
technique. Hence, the claimed data structures clearly 
have a practical application. Thus, these claims are 
functional material recorded on a computer-readable 
medium. The Patent Office has recently instructed: 

When functional descriptive 
material is recorded on some 
computer-readable medium it 
becomes structurally and 
functionally interrelated to the 
medium and will be statutory in 
most cases since use of technology 
permits the function of the 
descriptive material to be 
realized. 

"Interim Guidelines for Examination of Patent 
Applications for Patent Subject Matter Eligibility", OG 
Notices: 22 November 2005, Annex IV.

As can be appreciated from the foregoing, 
independent claims 48, 50, 52, 53 and 67 recite statutory 
subject matter. Since claims 55-57 depend from claim 48, 
since claims 51 and 58-60 depend from claim 50, since 
claims 61-63 depend from claim 52, and since claims 54 
and 64-66 depend from claim 53, these claims also recite
statutory subject matter. 
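For illustration only, the fingerprinting approach summarized above in connection with claims 48, 50, 52, 53 and 67 (hashing each extracted part into one of a predetermined number of lists and generating a fingerprint for each list), together with the test recited in claim 49 (whether any one fingerprint of one document matches any one fingerprint of the other), might be sketched as follows. This is a minimal, hypothetical sketch, not the applicants' implementation and not a characterization of the scope of any claim: the use of whitespace-delimited words as the extracted parts, SHA-1 as the hash, four lists, and the names stable_hash, fingerprints and are_near_duplicates are all assumptions, and the optional preprocessing step is omitted.

```python
import hashlib

# Hypothetical parameter: the "predetermined number of lists" (assumed value).
NUM_LISTS = 4


def stable_hash(text):
    """Return a 64-bit integer hash that is stable across runs.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest (SHA-1 here, an assumption) is used instead.
    """
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprints(document, num_lists=NUM_LISTS):
    """Hash each extracted part (assumed: a lowercase word) into one of
    num_lists lists, then generate one fingerprint per non-empty list."""
    lists = [[] for _ in range(num_lists)]
    for word in document.lower().split():
        lists[stable_hash(word) % num_lists].append(word)
    # Empty lists are skipped so two documents are never matched merely
    # because both happen to leave the same list empty.
    return [stable_hash(" ".join(parts)) for parts in lists if parts]


def are_near_duplicates(doc_a, doc_b, num_lists=NUM_LISTS):
    """Treat two documents as near-duplicates if any fingerprint of one
    matches any fingerprint of the other."""
    return not set(fingerprints(doc_a, num_lists)).isdisjoint(
        fingerprints(doc_b, num_lists)
    )
```

In this sketch, a one-word edit changes only the list(s) into which the removed and added words hash, so the fingerprints of the remaining non-empty lists still match. This is why a single matching fingerprint can suffice. For very short documents, every non-empty list may be affected, and the test can miss a near-duplicate.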

Rejections under 35 U.S.C. § 102

Claim 49 is rejected under 35 U.S.C. § 102(e) as being
anticipated by U.S. Patent No. 6,119,124 ("the Broder
patent"). The applicants respectfully request that the
Examiner reconsider and withdraw this ground of rejection 
in view of the following. 

Claim 49 is not anticipated by the Broder patent 
because the Broder patent does not teach concluding that 
two documents are near-duplicates if any one fingerprint 
of one of the documents matches any one fingerprint of 
the other document, where each of the documents has at 
least two fingerprints. 

The Examiner contends that the Broder patent teaches 
this feature, citing column 10, lines 27-29. (See Paper 
No. 20061013, pages 7 and 8.) However, this section of 
the Broder patent concerns reducing computational 
workload by eliminating (1) identical documents and (2) 
equivalent documents such that a cluster of documents 
does not include identical or equivalent documents. The 
Broder patent does so by (1) fingerprinting the entire 
document (for purposes of identifying identical 
documents) and (2) fingerprinting a canonical form of the 
document and/or a set of shingles of a document (for 
purposes of identifying equivalent documents) so that if 
two documents with identical fingerprints are 
encountered, only one is used in the clustering process. 
After clustering is completed, the eliminated documents 
are added back in. (See, e.g., column 10, lines 12-30.) 

As can be appreciated from the foregoing, the 
fingerprints of entire documents (or of a canonical form 
of a document or of a set of shingles of a document) are 
not used to conclude whether or not two documents are 
near duplicates. Rather, they are used in an
optimization technique applied during clustering. (See, 
e.g., column 9, lines 59 and 60.) More importantly, 
claim 49 recites that each of the documents includes at 
least two fingerprints. In the cited portion of the 
Broder patent, single fingerprints, representative of 
each document, are used to find identical (or 
lexically-equivalent or shingle-equivalent) documents. 
Thus, claim 49 is not anticipated by the Broder patent 
for at least the foregoing reasons. 

Claims 50-54, 58-60, 61-63 and 64-66 are rejected 
under 35 U.S.C. § 102(b) as being anticipated by U.S.
Patent No. 5,850,490 ("the Johnson patent"). The 
applicants respectfully request that the Examiner 
reconsider and withdraw this ground of rejection in view 
of the following. 

The Johnson patent is first briefly described. The Johnson
patent concerns storing and retrieving image information 
from scanned documents. As shown in Figure 5, different 
types of documents (different "document classes") can 
have different types of information (different "segment 
classes") at different positions. Referring to column 
9, lines 24-27, a "class of documents" is a category
(such as, for example, article, pleading, memorandum, 
etc.) to which a document may belong. Referring to 
column 9, lines 32-36, a "class of document segments" is 
a category (such as, for example, title, author, 
abstract, body, journal, date, sender, recipient, 
subject, plaintiff, defendant, court, etc.) to which a 
document segment may belong. Referring back to Figure 5, 
a schema 120 may be used to associate certain segment 
classes 124, 126 (at certain positions 130, 134), with a
certain document class 122.

Referring to Figures 10-12, a user may manually mark 
forms, to be scanned along with the document, in order to 
specify types of documents or document segments to be 
expected, as well as expected positions of the document 
segments. The scanned document image information may be 
stored as segment data, and may be associated with a 
document identifier and segment classes in either a list, 
as shown in Figure 15, or a table as shown in Figure 16. 

Referring to Figure 16, image segments from a 
document may be stored in a table 450. Each table 450 
pertains to a particular document class. Each row 452 of 
the table 450 corresponds to a particular scanned 
document belonging to the particular document
class. In each row, segment data is provided under a
column corresponding to a segment class (of the
particular document class). In the example of Figure 16,
image data of a title segment, an author segment, a text 
segment, etc., is provided. 

Information stored in tables, such as 450 of Figure 
16, can be searched by first applying search constraints 
to the class of documents to find one or more tables, and 
then applying search constraints to class of document 
segments to find one or more columns of the one or more
tables. Finally, search constraints can be applied to 
the content of the one or more columns of the one or more 
tables. Upon finding an entry with a field that includes 
segment data satisfying all of the constraints, the 
corresponding document identifier can be obtained and 
used to access the document's image data.

As can be appreciated from the foregoing, the
entries in the columns of the table 450 of Figure 16 
include image segments of scanned documents, which are 
arranged based on segment class. The segment class 
columns contain different information from different 
documents. That is, each row of the table corresponds to 
a different identified document. 

Independent claims 50, 52 and 53 are not anticipated 
by the Johnson patent at least because the Johnson patent 
does not teach a plurality of lists, each of the 
plurality of lists containing elements of a document 
identified by the document identifier stored in the first 
field. Rather, if it is the Examiner's position that the 
columns of the table 450 in the Johnson patent correspond 
to the claimed list, then each of these lists does not 
contain elements from a document. Rather, they contain 
elements from different documents. 

Therefore, claims 50, 52 and 53 are not anticipated 
by the Johnson patent for at least this reason. Since 
claims 51 and 58-60 depend from claim 50, since claims 
61-63 depend from claim 52, and since claims 54 and 64-66 
depend from claim 53, these claims are similarly not 
anticipated by the Johnson patent. 

Rejections under 35 U.S.C. § 103

Claim 48 is rejected under 35 U.S.C. § 103(a) as 
being unpatentable over the Johnson patent, in view of 
U.S. Patent No. 6,381,601 ("the Fujiwara patent"). The 
applicants respectfully request that the Examiner 
reconsider and withdraw this ground of rejection in view
of the following. 

Claim 48 is not rendered obvious by the Fujiwara and 
Johnson patents at least because (1) these patents 
neither teach, nor suggest, a plurality of lists, each of 
the plurality of lists containing elements of a document 
identified by the document identifier stored in the first 
field, wherein a hash function is used to hash each of 
the elements in order to determine which one of the 
plurality of lists that each of the elements will be 
contained in, and (2) one skilled in the art would not 
have been motivated to combine these patents as proposed 
by the Examiner. 

First, as discussed above with reference to claims 
50, 52 and 53, the columns of table 450 of the Johnson 
patent do not teach a plurality of lists, each of the
plurality of lists containing elements of a document 
identified by the document identifier stored in the first 
field. The Examiner does not rely on the Fujiwara patent 
to compensate for this deficiency. Thus, the rejection 
of claim 48 is improper for at least this reason. 

Second and more importantly, one skilled in the art 
would not have been motivated to combine the references 
as proposed by the Examiner. The Examiner concedes that 
the Johnson patent does not teach using a hash function to
hash each of the elements (of a document stored in a 
table) to determine which of a plurality of lists that 
each of the elements will be contained in. In an attempt 
to compensate for this admitted deficiency in the Johnson 
patent, the Examiner relies on the Fujiwara patent as 
teaching using a hash function to hash elements in order 
to determine which of a plurality of lists each element 
will be contained in. The Examiner then concludes,
incorrectly, that it would have been obvious to one 
skilled in the art at the time of the invention to use a 
hash function to hash each of the elements in order to 
determine which of a plurality of lists that each element 
will be contained in "to reduce or remove duplicate 
elements by using a hash function." Paper No. 20061013,
page 11. The applicants strongly disagree. 

As mentioned above, in the table 450 of Figure 16 of 
the Johnson patent, each of the columns is associated 
with a different segment class. Recall that "class of 
document segments" is a category (such as, for example, 
title, author, abstract, body, journal, date, sender, 
recipient, subject, plaintiff, defendant, court, etc.) to 
which a document segment may belong. Thus, in the 
Johnson patent document segments are placed in columns of 
the table 450 based on their segment class. The search 
in Johnson utilizes this arrangement. Thus, it is 
believed that arranging document segments into different
columns of the table on the basis of a hashing function 
rather than on the basis of segment class would destroy 
important functionality in the Johnson patent. 
Therefore, one skilled in the art would not have been 
motivated to modify the Johnson patent as proposed by the 
Examiner. Consequently, claim 48 is not rendered obvious 
by the Johnson and Fujiwara patents for at least this 
additional reason. 

Claims 48, 55-57 and 67 are rejected under 35 U.S.C.
§ 103(a) as being unpatentable over U.S. Patent No. 
6,360,215 ("the Judd patent"), in view of the Fujiwara 
patent. The applicants respectfully request that the
Examiner reconsider and withdraw this ground of rejection 
in view of the following. 

As discussed by the Examiner, the Judd patent does 
include a "document index" record. As described in the
Judd patent, the document index maps document identifiers 
to specific document location identifiers (e.g., URLs), 
or to other information that may be displayed after a 
search, such as a document title or abstract. (See, 
e.g., column 7, lines 5-9.) However, the discussion of 
using a hash function in column 7, lines 65 through 
column 8, line 9 in the Judd patent pertains to hashing 
words of a "word index." As described in the Judd
patent, the word index includes an alphabetical list of 
words, each word being mapped to one or more document 
identifiers which identify documents including the word. 
(See, e.g., column 7, lines 1-5.) Thus, as the Examiner 
concedes, the Judd patent does not teach using a hash 
function to determine which of a plurality of lists that 
each of a number of document elements will be contained 
in. (See Paper No. 20061013, page 12.) 

The Examiner alleges that column 7, lines 45-50 of 
the Judd patent teach a plurality of "lists" containing
elements of a document (e.g., title, document summary, 
etc.) identified by a document identifier. (See Paper 
No. 20061013, page 11.) The cited section of the Judd 
patent states that each record of a document index may 
include a document identifier, a hashed value of the 
contents of the text of the document, and values of 
properties of the document. The applicants frankly fail 
to appreciate how any of these elements can be 
characterized as lists. If it is the Examiner's position
that columns of the document index correspond to lists, 
then these lists are not associated with a record
associated with an identified document as claimed. 

The Examiner relies on the Fujiwara patent as 
teaching the use of a hash function to hash elements in 
order to determine which of a plurality of lists that 
each element will be contained in, citing column 2, lines 
57-62, column 4, lines 47-62 and column 5, line 50 
through column 6, line 10. (See Paper No. 20061013, page
12.) The Examiner then concludes that it would have been 
obvious to one of ordinary skill in the art at the time 
of the invention to use a hash function to hash each of 
the elements in order to determine which of a plurality 
of lists that each element will be contained in because
this would somehow reduce or remove duplicate elements. 
(See Paper No. 20061013, page 12.) The applicants 
respectfully disagree. 

As best understood by the applicants, the Fujiwara 
patent (1) determines plural database records having the 
same value for a particular column or particular columns, 
and (2) deletes duplicate records. (See, e.g., column 1, 
lines 5-9.) The cited operation of the Fujiwara patent 
differs substantially from the claimed invention. 
Specifically, the Fujiwara patent states: 

. . . the step for determining 
possibility [or not for existence of 
at least another record having the 
identical value to that of at least a 
part of one or more predetermined 
columns of the records for each of 
plural records of the record list 
included in the database] includes 
the steps for generating a hash value 
for each of plural records of record 
list included in the database with 
the hash function using as an
argument the value of at least a part
of one or more predetermined columns
among those for the grouping of such 
records, determining whether at least 
another record having the hash value 
identical to that of the hash value 
generated for each record exists or 
not from a plurality of hash values 
generated for plural records, and 
also determining a part of plurality 
of records determined respectively 
that there is no other records having 
the identical hash values by such 
determination process as the records 
having no possibility that the other 
records having the identical value of 
at least a part of the columns among 
those for the grouping do not exist, 
[Emphasis added.] 

Column 4, lines 45-60. More specifically, the Fujiwara 
patent states: 

For each record, a hash value is 
generated (4) by the hash function 
hasl( ) using, as an argument, the 
value of hash column. . . . When one 
column G is used for the standard of 
grouping, the hash column H is 
identical to one column. When a 
plurality of columns G are used for 
the standard of grouping, the hash 
column H is identical to one column 
in a plurality of columns G or 
combination of a plurality of 
columns. [Emphasis added.] 

Column 5, line 60 through column 6, line 3. 

As can be appreciated from the foregoing, the hash 
function is not used to determine a column in which an 
element of a document will belong as claimed. Rather, a 
column or columns of a database record (to be used for 
the standard of grouping) are hashed to generate a hash
value for the record. Thus, even if the purported 
teaching of the Fujiwara patent were to be combined with 
the Judd patent, the combination still would neither 
teach, nor suggest, a data structure wherein a hash 
function is used to hash each of the elements to 
determine which one of the plurality of lists that each 
of the elements will be contained in.

Further, one skilled in the art would not have been 
motivated to combine these references as proposed by the 
Examiner. Specifically, in the Judd patent, in the 
document index, each record may include a document 
identifier and information extracted from the document. 
If it is the Examiner's position that a hash function is 
somehow to be used to determine what document identifier 
hashed document information should belong to, the
applicants respectfully submit that this would likely 
destroy functionality of the document index. If it is 
the Examiner's position that a hash function is somehow 
to be used to determine what document property (e.g., 
title, summary, etc.) an extracted value should be 
associated with, the applicants respectfully submit that 
one skilled in the art would not have been motivated to 
make this modification since doing so would clearly not 
make sense. Finally, the applicants fail to see how the 
modification proposed by the Examiner would reduce or 
remove duplicate documents as alleged. 

Accordingly, independent claims 48 and 67 are not 
rendered obvious by the Judd and Fujiwara patents for at 
least the foregoing reasons. Since claims 55-57 depend 
from claim 48, these claims are similarly not rendered 
obvious.

Also, dependent claim 56 further recites that each
of the elements of a document is a predetermined one of 
(A) a predetermined number of words, (B) a predetermined 
number of sentences, (C) a predetermined number of 
characters, (D) a predetermined number of paragraphs, and
(E) a predetermined number of sections. The Examiner 
alleges that a document title or document summary teaches 
an element defined by a predetermined number of words or 
sentences. (See Paper No. 20061013, page 12.) Since,
however, document titles and summaries are not confined 
to a predetermined number of words or sentences, the 
applicants respectfully submit that claim 56 further 
defines the claimed invention over the Judd and Fujiwara 
patents. 

Also, dependent claim 57 further recites that each 
of the elements of a document partially overlaps another 
of the elements of the document. The Examiner alleges 
that column 7, lines 47-50 of the Judd patent teaches 
this feature. (See Paper No. 20061013, page 13.) The 
applicants respectfully request that the Examiner clarify 
her position since the cited section simply refers to 
document elements that may be provided in a document 
index.

Claims 50-54, 58-60, 61-63 and 64-66 are rejected 
under 35 U.S.C. § 103(a) as being unpatentable over U.S.
Patent No. 6,873,982 ("the Bates patent"), in view of the
Johnson patent. The applicants respectfully request that 
the Examiner reconsider and withdraw this ground of 
rejection in view of the following. 

First, even assuming, arguendo, that one skilled in 
the art would have been motivated to combine these 
references as proposed by the Examiner, the proposed
combination neither teaches, nor suggests, a plurality of 
records, each of the records comprising (a) a first field 
for storing a document identifier; and (b) a plurality of 
lists, each of the plurality of lists containing elements 
of a document identified by the document identifier 
stored in the first field, wherein at least some of the 
plurality of lists include different numbers of elements, 
or wherein at least some contiguous elements in a 
document are not contiguous elements of a list, or 
wherein for each of the records, the number of lists is 
the same. 

Specifically, as noted in the previous response, 
assuming, arguendo, that each of the keyword fields 106 
of the Bates patent could be characterized as the claimed 
"lists", each of the keyword fields 106 does not contain 
elements of the document. Instead, each of the keyword 
fields 106 includes a single word of the document.
Indeed, the Examiner now concedes this point. (See Paper 
No. 20061013, page 13.) To compensate for this 
deficiency of the Bates patent, the Examiner contends 
that the Johnson patent teaches a plurality of lists each 
containing elements of a document. However, as discussed 
above, the Johnson patent does not teach a plurality of 
lists, each of the plurality of lists containing elements
of a document identified by the document identifier 
stored in the first field. Rather, if it is the 
Examiner's position that the columns of the table 450 in 
the Johnson patent correspond to the claimed list, then 
each of these lists does not contain elements from a 
document. Rather, they contain elements from different 
documents.

Thus, independent claims 50, 52 and 53 are not
rendered obvious by the Bates and Johnson patents for at 
least this reason. Since claims 51 and 58-60 depend from 
claim 50, since claims 61-63 depend from claim 52 and
since claims 54 and 64-66 depend from claim 53, these 
claims are similarly not rendered obvious by the Bates 
and Johnson patents. 

Further, the Examiner has not demonstrated that one 
skilled in the art would have been motivated to combine 
the Bates and Johnson patents. Thus, the Examiner has 
not made a prima facie showing of obviousness. More 
specifically, the Examiner states: 

It would have been obvious to one 
having ordinary skill in the art at the 
time of the invention... to have each of 
the plurality of lists containing more 
than one element of a document as 
suggested by Johnson because the 
difference are [sic] only found in the 
nonfunctional descriptive material and 
do not alter how the elements of [the] 
system function. Thus this descriptive 
material will not distinguish the 
claimed invention from the prior art in 
terms of patentability. [Emphasis 
added.]

Paper No. 20061013, page 14. The applicants respectfully 
submit that the Examiner has misapplied the cases cited. 

Specifically, in In re Gulack, 217 U.S.P.Q. 401
(Fed. Cir. 1983), the Court of Appeals reversed a 
rejection by the Board, finding that digits of Gulack's
invention were functionally related to a band which acted 
as a substrate. Id., at 404. In any event, the Federal 
Circuit further instructed that a specific type of
functional relationship is not required, stating: 

What is required is the existence of 
differences between the appealed claims 
and the prior art sufficient to 
establish patentability. The bare 
presence or absence of a specific
functional relationship, without 
further analysis, is not dispositive of 
obviousness. Rather, the critical
question is whether there exists any 
new and nonobvious functional 
relationship between the printed matter 
and the substrate. 

Id. In In re Lowry, 32 U.S.P.Q.2d 1031 (Fed. Cir.
1994), the Federal Circuit noted that Gulack cautioned
against the liberal use of printed matter rejections 
under section 103, and stated: 

... the Board erroneously extended a 
printed matter rejection under sections 
102 and 103 to a new field in this 
case, which involves information stored
in a memory. 

Id. at 1034. The data structures claimed in Lowry were
alleged to greatly facilitate data management by data
processing systems, and the Federal Circuit reversed the
§§ 102 and 103 grounds of rejection. Similarly, in the
instant application, the claimed data structures include 
lists formed from elements of a document. Fingerprints 
generated from these lists allow duplicate documents to 
be found using less time, processing and memory. Thus, 
the claimed lists are not non-functional descriptive
material. (If the Examiner's test were applied, any
stored data structure could be rendered unpatentable by
any other stored data structure, which would clearly be
improper.)

Thus, independent claims 50, 52 and 53 are not 
rendered obvious by the Bates and Johnson patents for at 
least this additional reason. Since claims 51 and 58-60 
depend from claim 50, since claims 61-63 depend from 
claim 52 and since claims 54 and 64-66 depend from claim 
53, these claims are similarly not rendered obvious by 
the Bates and Johnson patents. 

Claims 48, 55, 56 and 67 are rejected under 35 
U.S.C. § 103(a) as being unpatentable over the Bates 
patent, in view of the Johnson patent, and further in 
view of the Fujiwara patent. The applicants respectfully 
request that the Examiner reconsider and withdraw this 
ground of rejection in view of the following. 

First, as just discussed above with reference to 
claims 50, 52 and 53, the Examiner has not demonstrated 
that one skilled in the art would have been motivated to 
combine the Bates and Johnson patents. Thus, the 
Examiner has not made a prima facie showing of 
obviousness. Consequently, independent claims 48 and 67 
are patentable for at least this reason. Since claims 55 
and 56 depend from claim 48, they are similarly 
patentable. 

Second, as discussed above, in the Fujiwara patent, 
the hash function is not used to determine a column in
which an element of a document will belong. Rather, a 
column or columns of a database record (to be used for 
the standard of grouping) are hashed to generate a hash 
value for the record. Thus, even if the purported
teaching of the Fujiwara patent were to be combined with 
the Bates and Johnson patents, the combination still 
would neither teach, nor suggest, a data structure 
wherein a hash function is used to hash each of the 
elements to determine which one of the plurality of lists 
that each of the elements will be contained in. Thus,
claims 48 and 67 are patentable for at least this reason. 
Since claims 55 and 56 depend from claim 48, they are 
similarly patentable. 

Conclusion 

In view of the foregoing amendments and remarks, the 
applicants respectfully submit that the pending claims
are in condition for allowance. Accordingly, the 
applicants request that the Examiner pass this 
application to issue. 